63 research outputs found
GPU-based Acceleration of Symbol Timing Recovery
This paper presents a novel implementation of graphics
processing unit (GPU) based symbol timing recovery using
polyphase interpolators to detect symbol timing error. Symbol
timing recovery is a compute intensive procedure that detects
and corrects the timing error in a coherent receiver. We
provide optimal sample-time timing recovery using a maximum
likelihood (ML) estimator to minimize the timing error.
This is an iterative and adaptive system that relies on
feedback; therefore, we present an accelerated implementation
design that uses a GPU for timing error detection (TED),
enabling fast error detection by exploiting the 2D filter structure
found in the polyphase interpolator. This hybrid, heterogeneous
CPU/GPU architecture computes a low-complexity, low-noise
matched filter (MF) while simultaneously performing TED. We then
compare the performance of CPU- and GPU-based timing recovery at
different interpolation rates, minimizing the timing error and
improving detection by up to a factor of 35. We further improve
the process by applying GPU optimizations and block processing to
increase throughput, all while maintaining the lowest possible
sampling rate.
Laboratory for Telecommunications Sciences; National Science Foundation (NSF)
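The combination of a polyphase filterbank with a maximum-likelihood timing error detector can be sketched as follows. This is an illustrative outline only, not the paper's implementation: the filterbank decomposition is the standard commutator form, and the TED uses the classic ML form e = y(k)·y'(k), where y comes from the matched filter and y' from its derivative filter; the function names and window sizes are assumptions.

```python
import numpy as np

def polyphase_bank(proto, n_phases):
    """Split a prototype interpolation filter into n_phases polyphase
    subfilters (row p holds the phase-p taps), zero-padding as needed."""
    taps = np.concatenate([proto, np.zeros((-len(proto)) % n_phases)])
    return taps.reshape(-1, n_phases).T

def ml_ted(sample_win, mf_phase, dmf_phase):
    """Maximum-likelihood timing error estimate e = y * y':
    y  = matched-filter output for the selected phase,
    y' = derivative-matched-filter output for the same phase."""
    y = np.dot(mf_phase, sample_win)
    dy = np.dot(dmf_phase, sample_win)
    return y * dy
```

The 2D structure the abstract exploits is visible here: each polyphase row is an independent dot product over the same sample window, which maps naturally onto parallel GPU threads.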
Dataflow-based Design and Implementation of Image Processing Applications
Dataflow is a well-known computational model that is widely used for
expressing the functionality of digital signal processing (DSP)
applications, such as audio and video data stream processing, digital
communications, and image processing. These applications usually
require real-time processing capabilities and have critical performance
constraints. Dataflow provides a formal mechanism for describing
specifications of DSP applications, imposes minimal data-dependency
constraints in specifications, and is effective in exposing and
exploiting task or data level parallelism for achieving high performance
implementations.
To demonstrate dataflow-based design methods in a manner that is
concrete and easily adapted to different platforms and back-end design
tools, we present in this report a number of case studies based on the
lightweight dataflow (LWDF) programming methodology. LWDF is designed as
a "minimalistic" approach for integrating coarse grain dataflow
programming structures into arbitrary simulation- or platform-oriented
languages, such as C, C++, CUDA, MATLAB, SystemC, Verilog, and VHDL. In
particular, LWDF requires minimal dependence on specialized tools or
libraries. This feature --- together with the rigorous adherence to
dataflow principles throughout the LWDF design framework --- allows
designers to integrate dataflow modeling approaches into existing
design methodologies and processes, and to experiment with them,
relatively quickly and flexibly.
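The coarse-grain dataflow structure that LWDF integrates into host languages can be illustrated with a minimal sketch. LWDF organizes each actor around an "enable" operation (test whether the actor can fire) and an "invoke" operation (perform one firing); the Python class and method names below are illustrative assumptions, since LWDF itself targets languages such as C, CUDA, and Verilog.

```python
from collections import deque

class Fifo:
    """Dataflow edge: a bounded FIFO buffer of tokens."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._q = deque()
    def population(self):
        return len(self._q)
    def write(self, token):
        self._q.append(token)
    def read(self):
        return self._q.popleft()

class ScaleActor:
    """Single-rate actor: each firing consumes one input token and
    produces one scaled output token."""
    def __init__(self, gain, inp, out):
        self.gain, self.inp, self.out = gain, inp, out
    def enable(self):
        # Fireable only when an input token exists and output space remains.
        return self.inp.population() >= 1 and self.out.population() < self.out.capacity
    def invoke(self):
        self.out.write(self.gain * self.inp.read())

def run(actors):
    """Minimal scheduler: keep firing any enabled actor until quiescence."""
    fired = True
    while fired:
        fired = False
        for a in actors:
            while a.enable():
                a.invoke()
                fired = True
```

Because firing conditions are checked only through `enable`, a scheduler needs no knowledge of actor internals, which is what makes the model easy to retarget across simulation and platform languages.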
Advances in Architectures and Tools for FPGAs and their Impact on the Design of Complex Systems for Particle Physics
The continual improvement of semiconductor technology has provided rapid advancements in device frequency and density. Designers of electronics systems for high-energy physics (HEP) have benefited from these advancements, transitioning many designs from fixed-function ASICs to more flexible FPGA-based platforms. Today’s FPGA devices provide significantly more resources than those available during the initial Large Hadron Collider design phase. To take advantage of the capabilities of future FPGAs in the next generation of HEP experiments, designers must not only anticipate further improvements in FPGA hardware, but must also adopt design tools and methodologies that can scale along with that hardware. In this paper, we outline the major trends in FPGA hardware, describe the design challenges these trends will present to developers of HEP electronics, and discuss a range of techniques that can be adopted to overcome these challenges.
Multiobjective Optimization for Reconfigurable Implementation of Medical Image Registration
In real-time signal processing, a single application often has multiple computationally intensive kernels that can benefit from acceleration using custom or reconfigurable hardware platforms, such as field-programmable gate arrays (FPGAs). For adaptive utilization of resources at run time, FPGAs with capabilities for dynamic reconfiguration are emerging. In this context, it is useful for designers to derive sets of efficient configurations that trade off application performance with fabric resources. Such sets can be maintained at run time so that the best available design tradeoff is used. Finding a single, optimized configuration is difficult, and generating a family of optimized configurations suitable for different run-time scenarios is even more challenging. We present a novel multiobjective wordlength optimization strategy developed through FPGA-based implementation of a representative computationally intensive image processing application: medical image registration. Tradeoffs between FPGA resources and implementation accuracy are explored, and Pareto-optimized wordlength configurations are systematically identified. We also compare search methods for finding Pareto-optimized design configurations and demonstrate the applicability of search based on evolutionary techniques for identifying superior multiobjective tradeoff curves. We demonstrate the feasibility of this approach in the context of FPGA-based medical image registration; however, it may be adapted to a wide range of signal processing applications.
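The Pareto-optimization idea at the core of this abstract can be sketched concretely: given candidate wordlength configurations scored on two objectives (e.g., FPGA resource cost and implementation error, both minimized), keep only the non-dominated set. The sample points below are hypothetical, not results from the paper, and the search shown is exhaustive rather than the evolutionary techniques the paper compares.

```python
def dominates(q, p):
    """q dominates p if q is no worse in both objectives (cost, error)
    and strictly better in at least one; both objectives are minimized."""
    return q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])

def pareto_front(points):
    """Return the non-dominated configurations: the set a run-time
    system could keep so the best available tradeoff is always on hand."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

A run-time reconfiguration manager would then pick, from this front, the configuration whose resource cost fits the currently available fabric.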
Using the DSPCAD Integrative Command-Line Environment: User's Guide for DICE Version 1.1
This document provides instructions on setting up, starting up, and
building DICE and its key companion packages, dicemin and dicelang. This
installation process is based on a general set of conventions, which we
refer to as the DICE organizational conventions, for software packages.
The DICE organizational conventions are specified in this report. These
conventions are applied to DICE, dicemin, and dicelang, as well as
to other software packages developed in the Maryland DSPCAD
Research Group.
The DSPCAD Lightweight Dataflow Environment: Introduction to LIDE Version 0.1
LIDE (the DSPCAD Lightweight Dataflow Environment) is a flexible,
lightweight design environment that allows designers to experiment with
dataflow-based approaches for design and implementation of digital
signal processing (DSP) systems.
LIDE contains libraries of dataflow graph elements (primitive actors,
hierarchical actors, and edges) and utilities that assist designers in
modeling, simulating, and implementing DSP systems using formal dataflow
techniques. The libraries of dataflow graph elements (mainly actors)
contained in LIDE provide useful building blocks that can be used to
construct signal processing applications, and that can be used as
examples that designers can adapt to create their own, customized LIDE
actors. Furthermore, by using LIDE along with the DSPCAD Integrative
Command Line Environment (DICE), designers can efficiently create and
execute unit tests for user-designed actors.
This report provides an introduction to LIDE. The report includes
details on the process for setting up the LIDE environment, and covers
methods for using pre-designed libraries of graph elements, as well as
creating user-designed libraries and associated utilities using the C
language. The report also gives an introduction to the C language
plug-in for dicelang. This plug-in, called dicelang-C, provides features
for efficient C-based project development and maintenance that are
useful when working with LIDE.
Graphics Processing Unit–Accelerated Nonrigid Registration of MR Images to CT Images During CT-Guided Percutaneous Liver Tumor Ablations
Rationale and Objectives: Accuracy and speed are essential for the intraprocedural nonrigid MR-to-CT image registration in the assessment of tumor margins during CT-guided liver tumor ablations. While both accuracy and speed can be improved by limiting the registration to a region of interest (ROI), manual contouring of the ROI prolongs the registration process substantially. To achieve accurate and fast registration without the use of an ROI, we combined a nonrigid registration technique based on volume subdivision with hardware acceleration using a graphics processing unit (GPU). We compared the registration accuracy and processing time of the GPU-accelerated volume subdivision-based nonrigid registration technique to the conventional nonrigid B-spline registration technique. Materials and Methods: Fourteen image data sets of preprocedural MR and intraprocedural CT images for percutaneous CT-guided liver tumor ablations were obtained. Each set of images was registered using the GPU-accelerated volume subdivision technique and the B-spline technique. Manual contouring of the ROI was used only for the B-spline technique. Registration accuracy (Dice Similarity Coefficient (DSC) and 95% Hausdorff Distance (HD)) and total processing time, including contouring of ROIs and computation, were compared using a paired Student’s t-test. Results: Accuracy of the GPU-accelerated registrations and B-spline registrations, respectively, was 88.3 ± 3.7% vs. 89.3 ± 4.9% (p = 0.41) for DSC and 13.1 ± 5.2 mm vs. 11.4 ± 6.3 mm (p = 0.15) for HD. Total processing time of the GPU-accelerated registration and B-spline registration techniques was 88 ± 14 s vs. 557 ± 116 s (p < 0.000000002), respectively; there was no significant difference in computation time despite the difference in the complexity of the algorithms (p = 0.71). Conclusion: The GPU-accelerated volume subdivision technique was as accurate as the B-spline technique and required significantly less processing time.
The GPU-accelerated volume subdivision technique may enable the integration of nonrigid registration into routine clinical practice.
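The accuracy metric reported above, the Dice Similarity Coefficient, has a simple closed form: DSC = 2|A ∩ B| / (|A| + |B|) for two binary segmentation masks A and B. A minimal sketch (assuming NumPy boolean masks; not code from the study) is:

```python
import numpy as np

def dice(a, b):
    """Dice Similarity Coefficient between two binary masks:
    2 * |A intersect B| / (|A| + |B|); 1.0 means perfect overlap.
    Two empty masks are treated as perfectly overlapping."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0
```

In a registration study like this one, the masks would be a structure contour in the fixed CT image and the corresponding contour carried over from the registered MR image.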
NP-Click: A Programming Model for the Intel IXP1200
The architectural diversity and complexity of network processor architectures motivate the need for a more natural abstraction of the underlying hardware. In this paper, we describe a programming model, NP-Click, which makes it possible to write efficient code and improve application performance without having to understand all of the details of the target architecture. Using this programming model, we implement the data plane of an IPv4 router on a particular network processor, the Intel IXP1200, and compare results with a hand-coded implementation. Our results show the IPv4 router written in NP-Click performs within 7% of a hand-coded version of the same application using a realistic packet mix.